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Abstract. Tikhonov regularization for projected solutions of large-scale ill-posed 
problems is considered. The Golub-Kahan iterative bidiagonalization is used to project 
the problem onto a subspace and regularization then applied to find a subspace 
approximation to the full problem. Determination of the regularization parameter 
for the projected problem by unbiased predictive risk estimation, generalized cross 
validation and discrepancy principle techniques is investigated. It is shown that 
the regularized parameter obtained by the unbiased predictive risk estimator can 
provide a good estimate for that to be used for a full problem which is moderately to 
severely ill-posed. A similar analysis provides the weight parameter for the weighted 
generalized cross validation such that the approach is also useful in these cases, and 
also explains why the generalized cross validation without weighting is not always 
useful. All results are independent of whether systems are over or underdetermined. 
Numerical simulations for standard one dimensional test problems and two dimensional 
data, for both image restoration and tomographic image reconstruction, support the 
analysis and validate the techniques. The size of the projected problem is found using 
an extension of a noise revealing function for the projected problem Hnetynkova, 
Plesinger, and Strakos, [BIT Numerical Mathematics 49 (2009), 4 pp. 669-696.]. 
Furthermore, an iteratively reweighted regularization approach for edge preserving 
regularization is extended for projected systems, providing stabilization of the solutions 
of the projected systems and reducing dependence on the determination of the size of 
the projected subspace. 
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The solution of the numerically ill-posed linear system of equations 


b = Ax ex + 77, beF, x G IZ n , A e IZ mxn , ( 1 ) 

for matrix A of large dimension with m > n or m < n is considered. Matrix A is 
ill-conditioned; the singular values of A decay exponentially to zero, or to the limits of 
the numerical precision. Noise in the data is represented by 77 e 7 Z m , i.e. b = b ex + 77 
for exact but unknown data b ex that satisfies b ex = Ax ex for unknown exact model 
parameters x ex . Components ry of 77 are assumed to be independently sampled from a 
Gaussian distribution with mean 0 and covariance s'f. Given A and b an estimate for x 
that predicts b cx is desired. 

Discrete ill-posed problems of the form ([Tj) may be obtained by discretizing linear 
ill-posed problems such as Fredholm integral equations of the first kind and arise in 
many research areas including image deblurring, geophysics, etc. Due to the presence 
of the noise in the data and the ill-conditioning of A regularization is needed in order to 
obtain an estimate for x approximating x ex . Standard Tikhonov regularization provides 

x(a) = argmhi{|| Wrj(Ax - b)||| + a 2 ||D(x - x apr )|||}, (2) 

for weighted data fidelity term \\Wq(Ax.— b) ||| and regularization term ||D(x — x apr )|||. 
D is a regularization matrix, assumed here to be invertible, and x apr allows specification 
of a given reference vector of prior information for x. The unknown regularization 
parameter a trades-off between the data fidelity and regularization terms. The noise 
in the measurements b is whitened when Wq = Cq 7 for the covariance matrix 
Cq = diag(s 2 ,..., s^J. Introducing b = Wqh, A = WqA, shifting by the prior 
information through y = x — x apr , and assuming that the null spaces of A and D 
do not intersect, yields 

y(a) = argmin{||iy-r|| 2 + a 2 ||Dy|| 2 }, r = (b - ix apr ) ( 3 ) 

y 

= ( A t A + a 2 D T D)- 1 A T r. 


Analytically when D is invertible, which is not always the case, we may write 
(A T A + a 2 D T D) = D t ((D t )- 1 A t AD ~ 1 + a 2 I n )D. 


Thus when it is feasible to calculate D _1 , or to solve systems of equations defined by 
invertible D, it is convenient to introduce the right preconditioned matrix A = AD ^ 1 
and regularized inverse A\a) = [A T A + a 2 / n ) -1 A T f which provides 


z(a) : = argmin{||Az — r|| 2 + cv 2 1 | z |||}, z(a) — Dy(a), and ( 4 ) 

Z 

x(a) = x apr + y(a) = x apr + D~ l A\a)r. ( 5 ) 


f Note that we use in general the notation Al (a) for the pseudo inverse of the augmented matrix 
[A-,aI]. 
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Although equivalent analytically, numerical techniques to solve 0 and 0 differ. For 
small scale problems, for example, we may solve 0 using the generalized singular value 
decomposition (GSVD), e.g. [20], for the matrix pair [A, D], but would us e the singular 


value decomposition (SVD) of A for 0 ^ e.g. [§], as given in Appendix |Appendix A 
dependent on the feasibility of calculating D _1 . Still, the use of the SVD or GSVD is not 
viable computationally for large scale problems unless the underlying operators possess 
a specific structure. For example, if the underlying system matrix, and associated 
regularization matrix are expressible via Kronecker decompositions, e.g. [13], then the 
GSVD decomposition can be found via the GSVD for each dimension separately. Here 
we consider the general situation and use of iterative Krylov methods to estimate x(a). 


1.1. Numerical solution by the Golub-Kahan bidiagonalization 

In principle iterative methods such as conjugate gradients or other Krylov methods, 
can be employed to solve 0. Results presented in n demonstrate, however, that 
MINRES and GMRES should not be used as regularizing Krylov iterations due to 
the early transfer of noise to the Krylov basis. Here we use the well-known Golub- 
Kahan bidiagonalization (GKB), implemented in the LSQR algorithm, which has been 
well-studied in the context of projected solutions of the least squares problem J2TJ 122] , 
Recently, there has also been some interest in the LSMR modification of LSQR, [I], but 
due to our goal to investigate the regularization parameter a we focus on LSQR for 
which the noise regularizing properties of the iteration are better understood, mm- 
Effectively the GKB projects the solution of the inverse problem to a smaller subspace, 
say of size t. 

Applying t steps of the GKB on matrix A with initial vector b, of norm [5\ = ||b|| 2 , 
and defining ef +1 ^ to be the unit vector of length t + 1 with a 1 in the first entry, lower 
bidiagonal matrix B t G 7 ^( i+1 )xi and column orthonormal matrices H t +i G 7 ^ mx ( t+1 ) ; 
G t G TZ nxt are generated such that, see mm, 

AGt = H t+1 B t , ft H t+1 ef +1) = b. ( 6 ) 

For x t = G t w tl full, rfuu(xt), and projected, r pro j(w t ), residuals are related via 

r fuii( x t) = Ax t - b = AG t w t - ft H t+l ef +1) = (7) 

H t+l B t w t - fti7 m e? +1) = H t+1 (B t w t - fte? +1) ) = H t+1 r proj (w t ), 

for which, by the column orthonormality of H t+1 , 

II Dull ( x t) || 2 = ||r P roj(w t )|||. (8) 

Theoretically, therefore, an estimate for x with respect to a reduced subspace may be 
found by finding w ( and then projecting back to the full problem. Matrix B t in most 
cases, however, inherits the ill-conditioning of the matrix A, [ 21 ], and regularization of 
the projected problem is needed. 
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By the column orthonormality of Gt, we have ||x £ || | = ||||| = ||w t |||. Thus, 
explicitly introducing regularization parameter (, distinct from a in order to emphasize 
regularization on the projected problem, yields the projected Tikhonov problem 

Wf(C) = arg min {||5 t w - /5ie^ +1) ||| + C 2 ||w|||}, (9) 

weK 1 

with solution 

w f (C) = faiBfBt + = ft ^J(C)e? +1) (10) 

= ( GjA T AG t + C 2 It)- l G T t A T b = (^G t ) t (C)b. 


In practice, one uses m to find w t (C) via the SVD for B tl under the assumption that 
t « m* = min(??7,n), noting that an explicit solution for w t is immediately available, 
see e.g. Appendix Appendix A| 

As already observed in [9j p. 302], the regularized LSQR algorithm now poses 
the problem of both detecting the appropriate number of steps t as well as of finding 
the optimal parameter £ opt . One method of regularization is simply to avoid the 
introduction of the regularizer in § and find an optimal t at which to stop the 
iteration. Although it is known that the LSQR iteration is a regularizing iteration, 
it also exhibits a semi-convergence behavior so that eventually regularization is also 
needed. This regularization may be achieved by either picking a in advance, namely 
regularize and project, or by the hybrid approach of regularizing the projected problem, 
e.g. |21 Q31 ng. The problem of first determining the appropriate size t for the projected 
space is discussed in e.g. [HUE] and more recently for large scale geophysical inversion 
in [21]. Although the solutions obtained from the regularize then project, and project 
then regularize, for a given t and a = ( are equivalent, [Id, Theorem 3.1], [9j p 301], 
this does not immediately mean that C opt for the subspace problem provides a opt for the 
full problem, 116 . 


Remark 1 . Determining to which degree certain regularization techniques provide a 
good estimate for a opt from the subspace problem estimate, and the conditions under 
which this will hold, is the topic of this work and is the reason we denote regularization 
parameter on the subspace by ( distinct from a. 


1.2. Regularization parameter estimation 

For the full problem the question of determining an optimal parameter a op t is well- 
studied, see e.g. [TOl [30] . for a discussion of methods including the Morozow discrepancy 
principle (MDP), the L-curve (LC), generalized cross validation (GCV) and unbiased 
predictive risk estimation (UPRE). The use of the MDP, LC and GCV is also widely 
discussed for the projected problem, particularly starting with the work of Kilmer et al, 
[16] and continued in [2j. Further, extensions for windowed regularization, and hence 
multi-parameter regularization, [T] are also applied for the projected problem [3]. Our 
attention is initially on the use of the UPRE. Effectively, the UPRE provides the correct 
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estimate for a opt in the context of a filtered truncated SVD (FTSVD) solution of ([2]) 
with t terms, provided that the LSQR factorization effectively captures the dominant 
right singular subspace of size t for matrix A. This observation does not immediately 
extend to the GCV. Applying a similar analysis as for the UP RE, however, provides a 
choice of the weighting parameter in the weighted GCV (WGCV) introduced in [2j. 

We stress that the approach assumes throughout, both numerically and 
theoretically, that the projected system is calculated with full reorthogonalization, 
a point not made explicit in many discussions, although it is apparent than many 
references implicitly make this assumption. 


1.3. Overview 


The paper is organized as follows. The regularization parameter estimation techniques 
of interest are presented in §|2} The discussion in §[2] is validated with one dimensional 
simulations in §[3| Image restoration problems presented in £j4] illustrate the relevance 
for the two dimensional case. In §4.4 we extend the hybrid approach for use with 
an iteratively reweighted regularizer (IRR), which sharpens edges within the solution, 
[22123I22ES1EDIM], hence demonstrating that edge preserving regularization can be 
applied in the context of regularized LSQR solutions of the least squares problem on 
a projected subspace. Finally in §4.5| we also illustrate the algorithms in the context 
of sparse tomographic reconstruction of a walnut data set, [7], demonstrating the more 
general use of the approach beyond deblurring of noisy data. Our conclusions are 
presented in j|5} It is of particular interest that our analysis applies for both over and 
under determined systems of equations and is thus potentially of future use for other 
algorithms also in which alternative regularizes are imposed and also require repeated 
Tikhonov solves at each step. Further, this work extends our analysis of the UPRE in 
the context of underdetermined but small scale problems in [28] 29], and demonstrates 
that IRR can be applied for projected algorithms. 


2. Regularization parameter estimation 

In order to use any specific regularization parameter estimation method for the projected 
problem it is necessary to understand the derivation on the full problem. We thus provide 
a brief overview of the derivations as needed. 


2.1. Unbiased Predictive Risk Estimator 

The predictive error, Pf u ii(x(a)), for the solution x(a), is defined by 

Pfuii(x(a)) = 4x(a) - b ex = AA\a) b - b ex = (A(a) - I m ) b ex + A(a)r), (11) 

where A(a) = AA'(a) is the influence matrix. The residual may also be written in 
terms of the A(a) as 

r fu ii(x(a)) = (A(a) - I m )b = ( A(a ) - J m )b ex + (A(a) - I m )r]. 


( 12 ) 
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In both equations the first term is deterministic, whereas the second is stochastic due 
to noise vector rj. To proceed we need the Trace Lemma e.g. [30], Lemma 7.2]. 

Lemma 1. For deterministic vector f, random vector 77 with diagonal covariance matrix 
Cfj, matrix F, and expectation operator E 

E(\\f + Fr 1 \\l) = \\f\\l+tr(Cr 1 F T F), 


using tr(A) to denote the trace of matrix A. 

Applying Lemma [I] to both ( [IT] ) and (12) with the assumption that C77 = / m , due 
to whitening of noise rj, and using the symmetry of the influence matrix, we obtain 


£(||p ful i(x(a))||^) = ||(i4(a) - Im) b ex ||l + tr (A T (a)A(ot)) (13) 

^(||r ful i(x(ct))|||) = ||(A(a) - I m )Ky\\l + tr((A(a) - I m ) T (A(a) - I m )). (14) 


Here £ , (||pf u n(x(a))|| 2 )/m is the expected value of the risk of using the solution x(a) to 
predict b ex . The first term on the right hand side in each case cannot be obtained, but 
we may use i 7 (||rf u u(x(a))|| 2 ) ~ ||rf u n(x(a ))||2 in (14). Thus using linearity of the trace 


and eliminating the first term in the right hand side of (13) gives the UPRE estimator 
to find a opt 


Oopt = argmin{t/(a) = ||(.4(a) - / m )b||l + 2tr(A(a)) - m}. 


(15) 


Typically, a opt is found by evaluating (15) for a range of a, for example by the SVD see 
e.g. Appendix Appendix B with the minimum found within that range of parameter 
values, as suggested in m 3 for the GCV. See also e.g. [29) Appendix, (A.6)] for the 
formulae for calculating the function in terms of the SVD of matrix A. 


2.1.1. Extending the UPRE for the projected problem We observe that we may 
immediately write the predictive error and the residual in terms of the solution of the 
projected problem explicitly depending on the regularization parameter (. Specifically, 
defining the influence matrix (AG t )(C) = AGt{AG t )\C) for the projected solution we 
have 


Pfau(x t (0) = AG f wt(C) - b ex = (AG7)(C)b - b ex 
rfuii(x t (C)) = AG t w t { C) - b = ((AG7)(C) - I m ) b. 


By comparing (16) with (11) and (01 with ( |T2| ) we obtain 

tWC) = II {(AG t ){i) - I m ) b||l + 2tr((AG,)(C)) - m. 


(16) 

( 17 ) 


Now by ([8]) it is immediate that the first term can be obtained without finding x t ( Q. 
For the second term we observe 


(AGt)(C) — AGt((AGt) T AGt + C 2 X) l {AGt) T 

= H t+ 1 B t ((H t+ 1 B t ) T (H t+ 1 B t ) + Clf)-\Ht +l B t ) r 
= H t+1 + ehr'Bj) H ? +1 = H t+ iB t (QHj +1 
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Tr((^)(0) = Tr(i7 t+1 ^(C)^ T + i) = Tr(R(0), 

where the last equality follows from the cycle property of the trace operator for 
consistently sized matrices. Hence 

£M0 = lift (ft(C) - It + i)e{ +1 \\l + 2 Tr(H t (0) - m (18) 


can be evaluated without reprojecting the solution for every ( back to the full problem. 

Remark 2. Although the UPRE function can be found from the projected solution alone, 
it is not clear whether (18) has any relevance with respect to the projected solution, 
i.e. does this appropriately regularize the projected solution, otherwise it may not be 
appropriate to find ( to minimize this function on the subspace. 


The projected solution solves the problem with system matrix B t and right hand 
side vector fte^ 1 = Hj + ,b which also consists of a deterministic and stochastic part, 
Hj +l b ex + Hf +1 r], where for white noise vector rj and column orthogonal H t+ 1 , Hf +1 r] 
is a random vector of length t + 1 with covariance matrix I t+ \. Tims the UPRE for the 
projected problem is 


two = IIA(B<(0 - /<+i)e!‘ +1) ||l + 2 Tr(B,(0) - (t + 1). (19) 


Comparing (18) with (19), it is immediate that minimizing (18) to minimize the risk 
for the projected solution, also minimizes the risk for the full solution with respect to 
the given subspace. 

ft remains to determine whether there is any case in which hncling £ opt also 
minimizes the predictive risk (15) for the full problem. Specifically it is not immediate 
that Uf u u(o; 0 pt) ~ Ufuii(Copt) because a opt is needed with respect to solutions in Range(U), 
not just restricted to Range(G t Vt)- Here matrices V and V t , the column orthogonal 
matrices arising in the SVDs of A and B t , respectively, span the respective right singular 
subspaces. Although in exact arithmetic the large singular values of B t provide a good 
approximation of the large singular values of A, J6] Section 9.3.3], the number of small 
singular values in the spectrum of B t limits how well the full problem will be regularized 
by regularizing the projected problem. Adopting now the statement of full regularization 
of the LSQR as given in [15], namely that the LSQR iterate with t steps effectively 
captures the f-dimensional dominant right spectral space of A, suppose that t is such 
that the singular values of B t approximate the t largest singular values of A with the 
natural order so that necessarily for t <t *. Equivalently this requires that t* 

is close to t and that the spectrum of B t contains no singular value approximating a very 
small spectral value of A. It is shown in m Theorem 2.3] that this requirement is more 
likely satisfied for severely and moderately ill-posed problems, than for mildly ill-posed 
problems. Further, in such cases the LSQR solution on the space of size t approximates 
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the TSVD solution of the full problem, namely the solution of the full problem with 
filter factors </>*(«) = 0 for i > t. Then, 


Tr(A(a)) = | t — a 2 ^(of + a 2 ) 1 j + I (m* — t) — a 2 ^ (cr 2 + a 2 ) 1 
v, i= 1 ) V i=t +1 / 

ft \ t 

t - a 2 J^(cr 2 + a 2 Y l ) « t-a 2 ^(y* 2 + « 2 ) _i = Tr(R t (a)). (20) 


i= 1 


i=l 


Thus, in the situation in which the LSQR iterate provides full regularization, 
determining £ opt to minimize (19) will yield a opt which is optimal for the filtered 
truncated SVD (FTSVD) solution of the full problem. This observation also follows 
Theorem 3.2 [16] which connects the use of the TSVD of B t for the solution with the 
solution obtained using the TSVD of A. To summarize: 


Remark 3 (UPRE). Ift is such that 4>i(a) ~ 0 fori > t, and such that the LSQR iterate 
provides full regularization so thatTx(A(a )) ~ Tr (B t (a)) andR&nge(G t V t ) approximates 
Range(V)), then C opt ~ a opt when obtained using the UPRE. Further, the estimate is 
found without projecting the solution back to the full space, namely by minimizing (19). 


When the LSQR does not provide full regularization , denoted as partial 
regularization in ra. the above result will not hold, and B t captures the ill-conditioning 
of A through the inclusion of inaccurate small singular values in the spectrum of B t . 


2.2. Morozov discrepancy principle 

Although it is well-known that the MDP always leads to an over estimation of the 
regularization parameter, e.g. [IB], it is still a widely used method for many applications, 
and is thus an important baseline for comparison. The premise of the MDP, ra. to find 
a is the assumption that the norm of the residual, || r fu ii(x(o:)) ||| follows a y 2 distribution 
with 5 degrees of freedom, ||r fu n (x(cr) )||| = 5. Heuristically, the rationale for this choice 
is seen by re-expressing ((7|) 

r ful i(x(a)) = dx(a) - b = A(x(a) - x ex ) - rj, 


so that if x(a) has been found as a good estimate for x ex , then the residual (j7|) should 
be dominated by the whitened error vector rj. For white noise ||r 7 ||| is distributed as 
a x 2 distribution with m degrees of freedom, from which -E'(|| 771 | 2 ) = m, with variance 
2m. Thus we seek a residual such that 6 = vm using a Newton root-finding method, 
see Appendix Appendix B where we take safety parameter v > 1 to handle the well- 
known over smoothing of the MDP. Applying the same approach for the projected 
residual yields the noise term Hf + 1 r] replacing 77 . Thus the degrees of freedrom are 
reduced to t + 1 and we seek a residual such that S = v(t + 1). A number of other 
suggestions for a projected discrepancy principle have been presented in the literature, 
but generally imply using 6 pz 7 ,; 117711 | pz || 77 1||) pz vm dependent on the noise level 
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of the full problem, e.g. [9} [HU [25], with v > 1. It is reported in [9], however, that 
while the theory predicts choosing v > 1, numerical experiments support reducing v. 
Alternatively this may be seen as reducing the degrees of freedom, instead of reducing 
v. We deduce that the interpretation for finding the regularization parameter based on 
the statistical property of the projected residual in contrast to the full residual should 
be important in determining 5. 

Remark 4 (MDP). For the MDP the degrees of freedom change from m to t + 1 when 
the residual is calculated on the full space as compared to the projected space. Thus ([ opt 
is not a good approximation for a opt when obtained using 5 pro j as a guide for the actual 
size of the projected residual. For the full regularization the degrees of freedom for the 
full problem are reduced and again £ opt ~ a opt . 


2.3. Generalized cross validation 


Unlike the UPRE and MDP, the GCV method for finding the regularization parameter a 
does not require any information on the noise distribution for rj. The optimal parameter 
a is found as the minimizer of the function 


Gf u ii(o;) 


l|rfuii(x(a))l|| 
(tr(A(a) - I m )) 2 ’ 


( 21 ) 


ignoring constant scaling of Gf u n(a) by n, [5]- The obvious implementation of the GCV 
for the projected problem is the exact replacement in (j2l| using the projected system 


GWC) 


l|r P roj(w t (C))l|| 

(tr(R,(C)-/ m )) 2 ’ 


( 22 ) 


as indicated in [16]. It was recognized in [2j Section 5.4], however, that this formulation, 
tends to lead to solutions which are over smoothed and as an alternative the WGCV 
was introduced, dependent on parameter oj, 


G pro j (C, k-0 


II r P roj(w f (0)||| 

(tr(cvB t (() - I t+ i)f 


Experiments illustrated that oj should be smaller for high noise cases, but in all cases 
0 < oj < 1 is required to avoid the potential of a zero in the denominator. The choice 
for oj was argued heuristically and an adaptive algorithm to find oj was given. 

Consider now the two denominators in (21) and (22). First of all by (20) it is not 
difficult to show, for 0 < oj < 1, that 


0 > Tr (B t (a) - I t+ i) > Tr(A(a) - / m ), 

so that G pro j(a) > G'fuii (a) and a chosen to minimize the projected GCV will not 
minimize the full GCV term. For the weighted GCV, however, 

t i 

Tr(/ i+ i — ojB t {()) = (1 + t — ojt )) + o j(,~ ^ 2 , y 2 

i =i + ^ 


( 23 ) 
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and for the full regularization in which we approximate the full TSVD, ~ 0 for 

i > t, 


Tr (I m — A(ct)) — (m — m*) + a 2 


. af + a 2 

2=1 1 


(m 


-t) + a 2 ^2 


. cr + ad 
2=1 1 


(24) 


Factoring for m — t ^ o and (t + 1 — Lot) ^ 0 in (24) and (23), respectively, gives the 
scaled denominators 


(m — t ) 



1 

m — t 


E 



and (1 +1 


cot) 



co 

1 + t — cot 


E 



Ignoring constant scaling the denominators are equilibrated by taking 


1 


m — t 


——-- yielding u 

1 + t — cot 


1 + 2 
m 


< 1. 


This result suggests that we need (t + l)/m < co < 1 in order for (C opt to estimate a opt 
found with respect to the projected space. 

Remark 5 (GCV). Taking co = (t+l)/m gives the regularization of the FTSVD of the 
full problem, in the case of the LSQR with full regularization, namely for moderately 
and severely ill-posed problems as defined in m 


3. Simulations: One Dimensional Problems 

To illustrate the discussion in ^2] we examine the solution of ill-posed one dimensional 
problems with known solutions. In all experiments we use MATLAB@2014b and 
test problems phi Hips and gravity which are discretizations of Fredholm integral 
equations of the first kind provided in the Regularization toolbox, m Problem gravity 
depends on a parameter d determining the conditioning of the problem, here we use 
d = 0.75 yielding a severely ill-posed test problem. In contrast problem phillips is 
moderately ill-posed and the Picard condition does not holdJ§] Simulations for over 
and under sampled data are obtained by straightforward modification of the relevant 
functions in nn. We discuss representative results obtained for the undersampled case 
with m = 152 and n = 304, for which the condition number of A is 4.05e + 05 and 
3.38e + 17, for phillips and gravity, respectively. The function bidiag.gk associated 
with the software for the paper [33] , is used for finding the factorization (J6]) with full 
reorthogonalization against of all basis vectors (the default). 

For a given problem defined by (jl| without noise, noisy data are obtained as 

b c = b ex + rf = b ex + ?/||b ex || 2 e c , (25) 

§ The continuous and discrete Picard condition are well-described in the literature, e.g. m ■ Basically 
the Picard condition holds if the absolute values of the coefficients of the solution decay on average 
faster than the singular values. 
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for noise level rj and with e c the c th column of error matrix £ that has columns sampled 
from a random normal distribution using the MATLAB function randn(m, nc). The 
signal to noise ratio for the data given by 

BSNR(? 7 ,m) = 201ogl0 (-—~ —201ogl0 h]y/rn), (26) 

Vl|b c -b ex ||y 

is independent of the test problem. In particular BSNR(.005,152) « 24.2. Example 
simulation data are shown in Figure [l] for noise levels r) = .005 for each test problem 
with m = 152 and in each case for 5 samples of the noise, b c , c = 1: 5. In all simulations 
the matrices and right hand side data are weighted by the diagonal inverse square root 
of the covariance matrix, assuming colored noise. 




(a) phillips 


(b) gravity, d = .75 


Figure 1 . Illustrative test data for noise level g = .005 for sample right hand side 
data b c , c = 1: 5, with m = 152 and n = 304. Exact solutions are given by the solid 
lines in each plot. 


3.1. Spectra of A and B t 

Figure [2] illustrates the spectra of the matrices A and B t for each test problem for the 
pairs (m,n) = (152,304), for t = 1: 10: 91, and with noise level p = .005 used in the 
calculation of B t . For phillips there are clear steps in the singular values at indices 9, 
12, 16, 19 and 22, and the problem is only moderately ill-posed. In contrast, gravity is 
severely ill-posed, the singular values decay continuously and exponentially to machine 
precision. Because gravity is severely ill-posed the LSQR iteration quickly captures 
the dominant right singular subspace for small t. On the other hand, for phillips, 
the slower decay of the spectrum and the generation of small Ritz values introduces 
inaccurate small singular values into the spectrum of B tl as is seen by the departure of 
the spectrum of B t from the spectrum of A. This may present difficulty for estimating 
the regularization parameter using the presented approaches, unless t is small. 
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Figure 2. Plots of singular values against the index of the singular value for matrix 
B t for increasing t, t = 1 : 10 : 71 compared to the first 71 singular values of matrix A 
for underdetermined cases m = 152 and n = 304. 


3.2. Estimating t: the subproblem size 

To determine the size of the subproblem we examine the approach suggested in HI 
(3.9)] for determining the appearance of noise in the subspace. Denoting the diagonal 
components of B t by 0j, j = 1 :t and the subdiagonal entries by /3j, j = 2:t + 1, 
the cumulative ratio, p(t) = rij=i (0 j//3j+i), shows the impact of /3j + 1 approaching 
the precision of the algorithm as t increases. For large enough t and exact arithmetic 
f3 t + 1 -C 9 tj [Hj. Noise is identified as entering at optimal iteration f opt _ p , given by 


topt-p = nrin{arg max (p(t))} + 2. 


(27) 


Here t m ; n is problem dependent and chosen to ensure that noise has entered the problem 
and we take 2 steps beyond t min . There is little difference in the characteristic oscillation 
of p{t) considered in our examples; noise enters quickly and t m in = 3 is already sufficient. 
Figure [3] illustrates p{t) for the test problems with 5 samples of the noisy data. p{t) 
correctly identifies the point when the singular values reach machine precision for 
problem gravity. We point out that the noise levels used in these examples, and 
the subsequent simulations, are substantially larger than the noise levels used in [ 13] . 
for which the optimal choice of t is thus correspondingly larger. If we run our examples 
with less significant noise we do obtain results consistent with [13]. It was already noted 


in H, that ( J27J ) cannot be used when the discrete Picard condition does not hold, e.g. 
for phillips. An alternative method for identifying the subspace size is to minimize 
the GCV function for the TSVD, [3] 3.12], 


/ ^max 

S(t) = J] |ufb| 2 . (28) 

rmax l ) 

G(t) depends on the choice of f max , i.e. the size of the largest subspace considered, in 
contrast p{t) depends on the selection of t min but is independent of f max . To assess 
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the impact of the correct choice of t we also examine solutions obtained with larger 
subspaces. 




(a) phillips (b) gravity d = .75 


Figure 3. Noise revealing function p(t) for the two test problems for underdetermined 
cases with noise level y = .005 as in Figure [2] 


3.3. Regularization Parameter Estimation 

In implementing the parameter regularization we test the GCV with oo = 1 as 
compared to WGCV with u> = (t + 1 )/m, and the MDP with 5 — rn as compared to 
projected MDP (PMDP) 5 = (t + 1). Estimates using GCV, WGCV, MDP, PMDP 
and UPRE are obtained by calculating the relevant functions for the same set of 
regularization parameters, and then minimizing within the region of the optimum as 
used in Regularization Tools for the GCV. For all tests the regularization parameter 
yielding a minimum error (denoted in results by MIN) for the projected space is also 
found by calculating the regularization parameter at 1000 logarithmically sampled points 
between 71 and max( 10 _ 14 7 i, 7 t)|j|] 


3-4- Evaluating the results 


Contaminated data b c are generated using (25) for c = 1: 50 yielding solutions x(f, c) 
for problem size t with t = [3: 20,24: 5: 74], The relative error (RE) of the solution 
with respect to the known true solution is given by 


RE(t,c) = || x(t,c) x ex 11 2 /11 x ex 11 2 • 

The results in Table [l] are the average RE over all 50 samples at the reported average 
t op t-p and the minimum RE over all samples and all t. These results summarize the 

|| In practice one would not take such a large selection of regularization parameters, but in tests we 
found that the minimization step in the UPRE may yield reduced error if the optimum is not found 
by sampling over a sufficiently fine distribution for the regularization parameter. 
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graphs of the errors for the same cases in Figure [4j Immediately we observe that 
these one dimensional problems generally need rather small subspaces to obtain optimal 
solutions with respect to the full problem. They also show that the estimators, other 
than the PMDP and WGCV are quite robust to the choice of t, away from the optimal 
value. As suggested by the spectra of B t and A shown in §3.1 the WGCV is robust 
for gravity, for which the LSQR gives the full regularization, but not for phillips. 
Parameter dependent PMDP is not robust in either case. Also the GCV has minimal 
error for a larger subspace, and thus estimating i opt via t op t-p is not effective. On the 
other hand, using (28) t opt _g is larger, 6 and 74 respectively for the two problems, for 
which a smaller error is achieved by GCV, .13 and .23, respectively. 


Table 1. Average RE over 50 samples for problem size m = 152 and n = 304. t m - m = 3 
and with average t op t_ p as given. Results are illustrated in Figure [4} _ 



MIN 

MDP 

UPRE 

GCV 

WGCV 

PMDP 

Average RE: Average t opt p = 5 

phillips 

0.16 

0.16 

0.16 

0.17 

0.16 

0.16 

gravity d = .75 

0.17 

0.66 

0.52 

0.35 

0.49 

> 1 


Minimum 

TE (average t for the minimum) 

phillips 

0.06 (9) 

0.07 (8) 

0.07 (7) 

0.06 (24) 

0.07 (7) 

0.07 (7) 

gravity d = .75 

0.15 (4) 

0.27 (4) 

0.21 (4) 

0.22 (54) 

0.22 (4) 

0.22 (4) 




(a) phillips 


(b) gravity, d = .75 


Figure 4. Average RE over all samples against t for the underdetermined case: 
in = 152 and n = 304 : Noise level y = .005. 

The results confirm the expectation of the analysis on the performance of the 
techniques dependent on the degree of ill-posedness of the problem, gravity is more 
severely ill-conditioned and the WGCV and UPRE estimators give robust solutions 
independent of t, improving on the GCV and MDP solutions. 
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4-1. Image Deblurring 


We consider image deblurring problems, grain and satellite, of size 256 x 256 from 
RestoreTools [IS] . Our main aim here is to first demonstrate the use of the regularization 
techniques PMDP, WGCV and UPRE for increasing t. and then to examine a stabilizing 


technique using an IRR, [4.4 Results without IRR are presented for completeness in 


and with IRR in f 4.4.2 


For contrast with the results presented in [18] we use noise levels g = .00039 and 
.00019 in (25) which corresponds to noise levels v = 10% and 5%, respectively, in pEJ 
with ||? 7 c ||2 = ^11b ex 11 2 , yielding v = g y /m. These correspond to BSNR 20DB and 26DB 
as calculated by d26|) For immediate comparison with [TS] we indicate the results 


using the noise level v rather than g. In Figure [5] we give the true solution, blurred and 
noisy data and the point spread function. 



(a) True grain 


(d) True 

satellite 



(b) Contaminated, 
v = .1 



(e) Contaminated, 
v = .1 



(c) Point spread 
function 



(f) Point spread 
function 



Figure 5. Data for grain and satellite images with blur by the given point spread 
function and noise level 10%, corresponding to v = .1. 


4-2. Algorithm, Details 

In finding the restorations for the data indicated in Figures [5(b) and |5(e) we note first 


that the matrices A for the PSFs indicated in Figures 5(c) and 5(f) do not satisfy the 


For comparison with the results in [3] we note that there the BSNR is calculated using a definition 
in [19], and their results with 10DB and 25DB according to that definition yield 19DB and 34DB with 
([26]), resp. i.e. approximately 11% and 2% noise. Hence results here for 10% may be approximately 


compared with the 10DB results in [3]. 
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Picard condition. As illustrated in Figure [6j pit) does not show the increase within 
the shown range of t as is clear for p{t) with large t obtained for the one dimensional 
examples in Figure [3j The spectra for increasing t. shown in Figure [7] also demonstrate 
that these problems are only mildly to moderately ill-posed so that the LSQR iterate 
does not adequately capture the right singular subspace. Still, p(t) attains a minimum 
within the shown range for t and then exhibits a gradual increase. This suggests that 
noise is entering the data after the minimum and that one may use 

Apt— min = argmin(p(t)) + 2, 


where again we advance 2 steps under the assumption that noise enters after the 
minimum. In Figure [6] the vertical lines indicate the positions of t opt _ p , t opt _g and 
Apt— min • For grain, A P t-g becomes quickly independent of the number of terms used, 
already stabilizing at Apt-6 = 27 with just f max = 50 terms, with no change even out 
to a maximum size of t max = 250 in the calculation. For satellite Apt-6 is less stable 
only reaching 32 when t max = 100 terms are used in the estimation, but increasing to 96 
if Aax = 250 terms are imposed. Stability in the choice of A p t-p with respect to f min also 
follows lack of stability in choice of Apt-6, suggesting that it is preferable to use Apt— min - 
In our experiments we have deduced that it is important to examine the characteristic 
shape of p[t) in determining the optimal choice for the size of the subspace, and will 
show results using Apt-min, Apt-p and Apt-6- 

The range for the regularization parameter is also important as is indicated through 
the windowing approach based on (28), jT]. From Figure[7]it is clear that LSQR iteration 
only provides a partial regularization for either problem and that B t only captures a 
portion of the spectrum. Thus we use a single window defined by t* and apply a 
FTSVD for the solution which is dominant for the first t* terms, i.e. with filter factors 
4>i(() ~ 1 for i < t* and <fri(C) —* 0 for i > t*. With ( = ryj*, <pt*(C) = 1/(1 + t 2 ) < 1. 
In our results r = .1, t* = max(A P t-p, Apt-e), and we impose ryt* < C < 7i? f° r the 
range of (. For the minimal (MIN) solutions, the range is adjusted to 1CF 1 " 5 < £ < 1, 
consistent with the range for the regularization parameter used in |3]. This range is 
scaled by the mean of the standard deviation of the noise in the data, consistent with 
the inverse covariance scaling of the problems. In finding the MIN solution the range is 
sampled at 100 logarithmically sampled points. 


4-3. Results 

For quantitative measurement of a given solution as compared to the known solution 
we again use RE. Other possibilities include using the signal to noise ratio, which is 
directly related to RE, and the mean structural similarity index (MSSM) suggested in 
|32j . We found in our experiments that high MSSM corresponds to low RE and thus 
providing these results delivers little in terms of further assessment of the algorithms for 
image deblurring. The REs using the regularization parameter estimators in contrast to 
MIN and PROJ are illustrated in Figure [8] for restoration of the images in Figure [5(b) 
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Figure 6. Noise revealing function p(t) for the data illustrated in Figures [5(b)| and 
5(e) | with t m i n = 25. Here the dashed-dot vertical line corresponds to the location of 
topt-p, the solid line with symbol to t op t~g and the solid line to t op t-min- 




Figure 7. Plots of singular values against the index of the singular value for matrix 
B t for increasing t, t = 1 : 15 : 121, t = 188 and t = 250. 


and Figure 5(e)[ The results are consistent with the literature in terms of the semi 
convergence behavior of the LSQR and also confirm the increase in error seen with 
GCV without the weighting parameter. Results obtained with WGCV, PMDP and 
UPRE are consistent and verify the analysis in Section T3, providing a stable solution 
for increasing t. Solutions found at the noted t opt for UPRE as compared to the optimal 
solution with minimum error are illustrated in Figure [9] for problem grain. Results for 
the satellite image are similar. Overall the results demonstrate that the restorations 
are inadequate at this level of noise. 


4-4- Iteratively Reweighted Regularization 

Iteratively reweighted regularization provides a cost effective approach for sharpening 
images e.g. [3Tj, and has been introduced and applied for focusing geophysical inversion, 
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Relative Error Relative Error 




(a) grain RE 


(b) satellite RE 


Figure 8. RE with increasing t with regularization parameter calculated by the 


The solid line in each case is the solution with projection and without regularization. 


different regularization techniques for examples illustrated in Figures |5(b)| and 5(e' 



Figure 9. Solutions for noise level 10% for grain using UPRE to find the 
regularization parameters and comparing the solutions obtained for t opt _ p: f 0 pt-min 
and iopt-e as compared to the solution with minimum error, MIN. 


in this context denoted as minimum support regularization, [23] [27} 28} 2!). [33]. 
Regularization operator D is replaced by a solution dependent operator D^ k \ initialized 
with D^ ]) = I and x (0) = 0, yields iterative solution x^' +1 ^(a). For k > 0, with 

(U W )« = ((xf> - xf- 1 ') 2 + p)~V\ 

where j3 > 0 is a focusing parameter which assures that D^ is invertible. Immediately 

(n«)y = ((xf>-xf- i y+/? 2 )-^. 

Thus we can use Q with system matrix A ^ to obtain the iterative 

solution x^ fe+ b, k > 0. Furthermore, it is straightforward to modify the algorithm for 
calculation of the factorization © for the left and right preconditioned matrix A^ k \ 
also noting for the specific preconditioners that operations with the diagonal matrix are 
simple component-wise products. 
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4-4- 1- Comments on the parameter (3 Suppose that = x- fc_1 \ for i e X and (3 — 0, 
then (D~ l )a = 0 for i 6 I. Then rather than solving AD~ x z « r, we solve the reduced 
system Az « r, where A is A but with column i removed for i G 1 and all other columns 
scaled by the relevant diagonal entries from (ZW 1 )^. Matrix A is of size m x h where 
n — n— \X\ and vector z is vector z with entries iel removed. With regularization z is 
obtained as the solution of (A T A + a 2 In) z = A 1 r. Thus y = D~ x z, where D is obtained 
from D with the same diagonal entries i E X removed. The update for x is therefore 
obtained using ^ with entries x.[ k \a) = xf 1 + y*, for i ^ X and xf'^(a) = x- fc_1 \ for 
iel. Forthwith we use (3 = 0 and factorize the reduced system with system matrix A. 


4-4-2. Algorithmic Details for IRR The approach for the iteration requires some 
explanation as to how the range of t is obtained at IRR iterations k > 0. Noise 
revealing function p( fe )(£( fc_1 ), £) depends on the subspace size from the prior step 

k, and current subspace size t. Further, t op t- P , % P t-min and t opt -g are all dependent on 

t (k-i) 

as well as (A fc and tml x (t (k x )), i.e. given a specific subspace size at '- 1 
the minimum and maximum sizes to use at step k need to be specified. Because we 
anticipate that further noise enters with increasing k , we expect t^ n < and that 

tmlx(t) < With these constraints the cost of an IRR step will be less than the 

first step k = 0. Examination of i) is useful in identifying the constraints on 

C k \ For each iteration the range for ( is constrained using the current singular values, 
by T 7 j < ( < 7 i, where at step 0, t* = max(t op t_p, t opt -g) and t* = t opt _ p for the IRR 
updates. We will examine the choices for the case with 5% noise. 


44.3. Results with IRR for 5% noise Here we only report results obtained using the 
UPRE as compared to the optimal solutions. These results are indicative of IRR 
implemented with the other methods for finding the regularization parameter. To 
examine the process carefully in one case, we focus on problem grain with 5% noise. 
Function p{t) at the first step k = 0 does not differ significantly from the case with 
10% noise, shown in Figure |oJ We use a maximum subspace size with t = 100 for the 
calculation of f op t-g- Figure 10(a) shows t) for the choices of t, t op t _ p , f op t-mm 

and t opt _g. It is clear that t) is almost independent of f or the first steps, 

but that noise enters for k = 4. This is also reflected in the RE in Figure [10(b) the RE 
stabilizes for increasing t , and decreases for the first three steps of IRR, but increases 
at step 4. The impact of the IRR is similar for problem satellite. 

The RE for the two simulations, contrasted with MIN are detailed in Table [2} The 
IRR stabilizes the solutions leading to results comparable to those of MIN. Moreover, 
it is also clear that one may not conclude that Ending t opt using t op t-min is preferable 
to using t opt _g or t op t _ p . The results in Table [2] indicate that using t opt _ p leads to 
best solutions in one case, and t opt _g in the other, although effectively the quality is 
comparable. Provided that the solutions are stabilized with the IRR, improvements in 
the solutions are obtained in a limited number of steps using IRR with relatively small 
subspaces for the iterative updates. 
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(a) k > 0 p{t) 



Figure 10. Demonstration of determining the projected problem size with IRR 
iterations for problem grain with 5% noise.In Figure 10(a) p^ k \t) for increasing k, 
for k = 1: 4 left to right and then above and below. In Figure [T0(b)| the dashed-dot 
vertical line corresponds to the location of t opt _ p , the solid line with symbol to t op t-g 
and the solid line to Upt-min- 


Table 2. RE for problem grain and satellite with 5% noise corresponding to 
v = .05, for solutions found using different selection of t 0 pt as compared to the optimum 
found with UPRE and overall optimum selected over the range for 



grain with f op t-min = 21 and t opt -g = 27 

Iteration 

^opt—min 

^opt —Q 

Min for UPRE 

Overall Min 

1 

0.4248 

0.4074 

0.4030 

0.3993 

2 

0.4073 

0.3962 

0.3929 

0.3903 

3 

0.3899 

0.3811 

0.3788 

0.3776 

4 

0.3775 

0.3731 

0.3728 

0.3731 


satellite with t Q pt p = 42 and t 0pt ~g = 69 

Iteration 

^opt—p 

^opt —Q 

Min for UPRE 

Overall Min 

1 

0.3566 

0.3520 

0.3517 

0.3863 

2 

0.3375 

0.3430 

0.3373 

0.3382 

3 

0.3374 

0.3431 

0.3372 

0.3352 

4 

0.3385 

0.3485 

0.3371 

0.3352 


4-4-4- Terminating the IRR iteration The graphs of p(t) with increasing k in 
Figure 10(a) indicate that the properties of t) can be used to determine 

effective termination of the IRR, based on the iteration k when noise enters into 
p( fc )(f( fe_1 ),f). Our experience has shown that the optimal solution in terms of image 
quality is achieved not at the step before noise enters in , t) but two steps 

before. 
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4-4-5. Results for 10% noise In Figure 11 we illustrate how the RE changes with the 


iteration count. The vertical lines in Figure 11(a) demonstrate that using f opt _ p or t opt _g 
makes little difference to the quality of the solution when measured with respect to the 
RE. Example solutions for grain are given in Figure [I2j for contrast with the solutions 
without IRR shown in Figure [9} 



(a) RE grain 10% noise 


(b) RE satellite 10% noise 


Figure 11. RE for problem grain, (satellite) with noise level 10%, in Figures 11(a) 


and |ll(b)| respectively. In each case with increasing iteration for solutions calculated 
using UPRE. The solutions are stable out to k = 2 iterations of IRR. Here the dashed- 
dot vertical line corresponds to the location of t Q p t-p, the solid line with symbol to 
t a pt-g and the solid line to t 0 pt-min- 



Figure 12. Solutions for noise level 10% for grain using UPRE to find the 
regularization parameters and comparing the solutions obtained for different t op t as 
compared to the solution with minimum error, MIN, for IRR at the indicated step 
applied to the solutions in Figure [9| 


4-5. Sparse tomographic reconstruction of a walnut 

To contrast the success of the regularization parameter estimation techniques in the 
context of a 2D projection problem, we present results for the reconstruction of 
projection data obtained from tomographic x ray data of a walnut, used for edge 
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preserving reconstruction in [8j and with the description of the data described in [7j. 
Datasets DataN correspond to resolution N x N in the image, and use 120 projections, 
corresponding to 3° sampling. Data are provided with N = 82, 164 and 328. We use 
resolution 164 with 120 projections, and then downsampled to 60, 30 and 15 projections, 
i.e. angles 3°, 6°, 12° and 24°. Results with resolutions 82 and 328 are comparable. 


Results are presented using the solution at t opt _ p regularized using UPRE, Figures 14- 


15 


Results for 120 projections are almost perfect due to the apparent limited noise 
in the provided data, while the results with 15 projections clearly show the projection 
data. In all the presented results the parameters t opt _ p are determined automatically, 
after manually picking i min = 5 from manual consideration of the plot for p(t), see 


Figure 13 Further, from Figures 13(b) 13(d), it is immediate that in this case the 


LSQR iteration only offers a partial regularization independent of the sparsity and thus 
again it is necessary to identify the window for the usable spectrum. All parameters are 
then estimated in the same way as for the image restoration cases. 



(a) p(t) 




(b) 60 projections 



(c) 30 projections 


(d) 15 projections 


Figure 13. p(t) for increasing sparsity for resolution 164 for the walnut data in 
Figure [13(a) | and the spectra for increasing t, t = 1 : 15 : 91, 150, 200, with increasing 
sparsity in Figures 13(b)fl3(d)| 


Figures [I4[fl5| show results for one set of data at increasing sparsity, compare with 
[8j Figure 6.6 and Figure 6.7], which give results with resolution for N = 128 and 
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256, respectively, and angle separation 2°, 4°, 6° and 12°. The approach in [8] uses 
selected choices for the regularization parameter based on a sparsity argument with prior 
information and seeks to support the use of the sparsity argument for reconstruction 
of sparse data sets. The images exhibit the rather standard total variation blocky 
structures when applied for truly sparsely sampled data. Our results show robust 
reconstructions with the automatically determined solutions, after first examining the 
plot for p(t). 1RR generates marginal improvements in qualitative solutions. To show 
the impact of the correct choice of i opt on the solution we show a set of results at 
iteration k — 0 using £ min = 18 in Figure [16J with the positive constraint. The UPRE 
yields solutions qualitatively similar to the case with t min = 5. In the examples here 
we do impose an additional positivity constraint on the solutions at each step, before 
calculating the iterative weighting matrix. 

The results demonstrate that the projected problem with automatic determination 
°f Copt can be used to reconstruct sparsely sampled tomographic data, provided that 
an initial estimate for t mm is manually determined by consideration of the plot of p(t). 
Further, IRR stabilizes the solutions. For the sparse data sets the solutions do not 
exhibit the characteristic blocky reconstructions of total variation image reconstructions, 
as seen in [8]. 



(a) UPRE: k = 0 


(b) UPRE: k = 1 


(c) UPRE: k = 2 




Figure 14. Solutions at increasing iterations for walnut with resolution 164 x 164, 
tmin = 5, f 0 pt-p = 8, positivity constraint and sampling at 6° intervals, 60 projections. 


5. Conclusions 

We have demonstrated that regularization parameter estimation by the method of 
UPRE can be effectively applied for regularizing the projected problem. Our results 
also motivate a choice for the weighting parameter in the WGCV. These results apply 
for the concept of full regularization which was recently introduced in [13]. It was argued 
there, however, that in the case of full regularization, no additional regularization of the 
iterate is required, because the LSQR iterate approximates the TSVD solution of the full 
problem. It is known, however, that even with truncation additional filtering through 
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(a) UPRE: k = 0 



vJHn \j 


a/ 


(b) UPRE: k = 1 (c) UPRE: k = 2 


Figure 15. Solutions at increasing iterations for walnut with resolution 164 x 164, 
tmin = 5, fopt-p = 8, positivity constraint and sampling at 12° intervals, 30 projections. 



(a) UPRE: 60 (b) UPRE: 30 (c) UPRE: 15 


Figure 16. Solutions for walnut with resolution 164 x 164 without IRR, i.e. at k = 0, 
with £ m i n = 18 yielding f 0 pt-p = 25, 29, and 40, automatically determined for sampling 
intervals 6°, 12° and 24°. 


the use of a FTSVD solution is required. Moreover, while the projected solution may not 
actually need regularization, without prior knowledge of the optimal subspace size f, it is 
difficult to determine the point at which semi-convergence contaminates the projected 
solution. Hence effectively regularizing via the hybrid LSQR is still necessary. Our 
results demonstrate that the regularization estimators will fold effective regularization 
of the FTSVD solution of the full problem. In the case of the partial regularization , 
i.e. in the cases in which a small Ritz value appears before the LSQR iterate has 
captured the dominant SVD components of A, as discussed in [15], regularization on 
the projected problem will not adequately regularize the full problem. Here we handled 
the situation in which LSQR does not sufficiently capture the dominant singular space 
by restricting the range for the regularization parameter dependent on the singular value 
for 7 topt , so as to effectively use a FTSVD solution of the projected problem. For future 
investigation we suggest that it is important to identify the extent to which the LSQR 
iterate captures a right singular subspace for A, and to then potentially use more than 
one regularization parameter, one which is chosen to regularize the dominant terms of 
the spectrum, and one which handles the small singular values of B t , extending the 






Iterated Lanczos hybrid regularization 


25 


windowed regularization parameter techniques [T]. 

This work also demonstrates that edge preserving regularization, via iterative 
reweighting, can be applied to stabilize regularized solutions of the projected problem. 
Our results suggest manual estimation of a minimal subspace size can then lead to 
useful estimates for an optimal projected space, with the use of the IRR leading to 
improvements in the solutions when i opt is found by different methods, including the 
use of t 0 pt-p, Opt—min and t opt _g, hence making the determination of this Opt less crucial 
in providing an acceptable solution. Future work on this topic should include extending 
use of more general iteratively reweighted regularizes accounting for edges in more than 
one direction in conjunction with the projected solutions. 


Appendix A. Expansion Solutions 


Suppose the SVD of matrix A, A e 7Z mxn , is given by A = UY,V T , where the singular 
values are ordered G\ > cr 2 > • • • > er m * > 0 and occur on the diagonal of £ E IZ mxn with 
n — m zero columns (when m < n) or m — n zero rows (when m > n), and U E 
and V E IZ nxn are orthogonal matrices, [6j. Then 


XI OL ) — 


£ 

1=1 


G- 


ujb 


af + a 2 


- V ; = 


y^d>i(a)—Vi, bi = ufb. 
“ CTi 


(A.l) 


i =1 


For the projected case B t e 7^^ +1 ^ xt , i.e. m > n, and the expression still applies with 
||b|| 2 ef +1) replacing b, ( replacing a, 7 * replacing cij and m* = t in (A.l). 


Appendix B. Regularization Parameter Estimation 

All formulae apply using the SVD for B t replacing that for matrix A. 


Appendix B.l. Unbiased Predictive Risk Estimator 
The TJPRE function is given by 


U(a) = Yi 

i= 1 


a 2 a~ 2 + 1 



— m. 


Appendix B.2. Morozov Discrepancy Principle 
The MDP function is given by 



= 5 . 


For the projected case <5 pro j replaces 5. 
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Appendix B.3. Generalized Cross validation 

Using the SVD for B t the WGCV function is given by 


G(C,w) = 


El! 


7 2 <U 2 +1 


b 2 , spt+1 b 2 

u i ' Z^i=t +1 u i 


((1 + t - wt) + ujC 2 Ei=i ) 
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